Abstract — Due to the use of commodity software and hardware,
crash-stop and Byzantine failures are likely to be more prevalent
in today's large-scale distributed storage systems. Regenerating
codes have been shown in the literature to be a more efficient way
to disperse information across multiple nodes and recover from
crash-stop failures. In this paper, we present the design of
regenerating codes in conjunction with integrity checks that allow
exact regeneration of failed nodes and data reconstruction in the
presence of Byzantine failures. A progressive decoding mechanism
is incorporated in both procedures to leverage computation
performed thus far. The fault-tolerance and security properties
of the schemes are also analyzed.

Index Terms — Network storage, Regenerating code, Byzantine 
failures, Reed-Solomon code, Error-detection code 

I. Introduction 

Storage is becoming a commodity due to the emergence 
of new storage media and the ever decreasing cost of conven- 
tional storage devices. Reliability, on the other hand, continues 
to pose challenges in the design of large-scale distributed 
systems such as data centers. Today's data centers operate 
on commodity hardware and software, where both crash- 
stop and Byzantine failures (as a result of software bugs, 
attacks) are likely the norm. To achieve persistent storage, one 
common approach is to disperse information pertaining to a 
data file (the message) across nodes in a network. For instance, 
with (n, k) maximum-distance-separable (MDS) codes such as 
Reed-Solomon (RS) codes, data is encoded and stored across 
n nodes, and an end user or a data collector can retrieve the
original data file by accessing any k of the storage nodes, a 
process referred to as data reconstruction. 

Upon failure of any storage node, data stored in the failed 
node needs to be regenerated (recovered) to maintain the 
functionality of the system. A straightforward way for data 
recovery is to first reconstruct the original data and then 
regenerate the data stored in the failed node. However, it is 
wasteful to retrieve the entire B symbols of the original file, 
just to recover a small fraction of that stored in the failed node. 
A more efficient way is to use regenerating codes, which were
introduced in the pioneering works by Dimakis et al. in [1], [2].
A tradeoff can be made between the storage overhead
and the repair bandwidth needed for regeneration. Minimum
Storage Regenerating (MSR) codes first minimize the amount
of data stored per node, and then the repair bandwidth,
while Minimum Bandwidth Regenerating (MBR) codes carry
out the minimization in the reverse order. The design of
regenerating codes has received much attention in recent
years [3]-[10]. Most notably, Rashmi et al. proposed optimal
exact-regenerating codes using a product-matrix construction
that recover exactly the same stored data of the failed node
(hence the name exact-regenerating) [10]. Existing work
assumes crash-stop behavior of storage nodes. However, with
Byzantine failures, the stored data may be tampered with, resulting
in erroneous data reconstruction and regeneration.

In this paper, we consider the problem of exact regeneration 
for Byzantine fault tolerance in distributed storage networks. 
Two challenging issues arise when nodes may fail arbitrarily. 
First, we need to verify whether the regenerated or recon- 
structed data is correct. Second, efficient algorithms are needed 
that incrementally retrieve additional stored data and perform 
data-reconstruction and regeneration when errors have been 
detected. Our work is inspired by [10] and makes the following
new contributions:

• We present the detailed design of an exact-regenerating
code with error correction capability.¹

• We devise a procedure that verifies the correctness of
regenerated/reconstructed data.

• We propose progressive decoding algorithms for
data-reconstruction and regeneration that leverage
computation performed thus far.

The rest of the paper is organized as follows. We give an
overview of regenerating codes and RS codes in Section II to
provide the reader with the necessary background. The designs of
error-correcting exact-regenerating codes for the MSR points
and MBR points are presented in Section III and Section IV,
respectively. Analytical results on the fault tolerance and
security properties of the proposed schemes are given in
Section V. Related work is briefly surveyed in Section VI.
Finally, we conclude the paper in Section VII.



1 The encoding process is the same as that given in [10] except that an
explicit encoding matrix is given in this work.

II. Preliminaries






A. Regenerating Codes 

Regenerating codes achieve bandwidth efficiency in the
regeneration process by storing additional symbols in each
storage node or accessing more storage nodes. Let $\alpha$ be the
number of symbols over the finite field $GF(q)$ stored in each
storage node and $\beta \le \alpha$ the number of symbols downloaded
from each storage node during regeneration. To repair the stored
data in the failed node, a helper node accesses $d$ surviving
nodes with a total repair bandwidth of $d\beta$. In general, the
total repair bandwidth is much less than $B$. A regenerating
code can be used not only to regenerate coded data but also
to reconstruct the original data symbols. Let the number of
storage nodes be $n$. An $[n, k, d]$ regenerating code requires
at least $k$ and $d$ surviving nodes to ensure successful
data-reconstruction and regeneration [10], respectively. Clearly,
$k \le d \le n - 1$.

The main result given in [3], [5] is the so-called cut-set
bound on the repair bandwidth. It states that any regenerating
code must satisfy the following inequality:

$$B \le \sum_{i=0}^{k-1} \min\{\alpha, (d - i)\beta\}. \tag{1}$$



Minimizing $\alpha$ in (1) results in a regenerating code with
minimum storage requirement, and minimizing $\beta$ results in
one with minimum repair bandwidth. It is impossible to attain
the minimum values of $\alpha$ and $\beta$ concurrently, and thus
there exists a tradeoff between storage and repair bandwidth.
The two extreme points in (1) are referred to as the minimum
storage regeneration (MSR) and minimum bandwidth regeneration
(MBR) points, respectively. The values of $\alpha$ and $\beta$ for
the MSR point can be obtained by first minimizing $\alpha$ and then
minimizing $\beta$:

$$\alpha = \frac{B}{k}, \qquad \beta = \frac{B}{k(d - k + 1)}. \tag{2}$$

Reversing the order of minimization, we have $\beta$ and $\alpha$ for
MBR as

$$\beta = \frac{2B}{k(2d - k + 1)}, \qquad \alpha = \frac{2dB}{k(2d - k + 1)}. \tag{3}$$



As defined in [10], an $[n, k, d]$ regenerating code with parameters
$(\alpha, \beta, B)$ is optimal if i) it satisfies the cut-set bound with
equality, and ii) neither $\alpha$ nor $\beta$ can be reduced unilaterally
without violating the cut-set bound. Clearly, both MSR and
MBR codes are optimal regenerating codes.

It has been proved that when designing $[n, k, d]$ MSR or
MBR codes, it suffices to consider those with $\beta = 1$ [10].
Throughout this paper, we assume that $\beta = 1$ for code design.



Hence (2) and (3) become

$$\alpha = d - k + 1, \qquad B = k(d - k + 1) = k\alpha \tag{4}$$

and

$$\alpha = d, \qquad B = kd - k(k - 1)/2, \tag{5}$$

respectively, when $\beta = 1$.
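As a quick numerical check of (1)-(5), the following minimal sketch evaluates the right-hand side of the cut-set bound at the MSR and MBR points. The function names and the choice of the $[100, 20, 38]$ parameters are illustrative assumptions, not part of the original construction.

```python
from fractions import Fraction


def cut_set_capacity(k, d, alpha, beta):
    """Right-hand side of the cut-set bound (1)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def msr_point(B, k, d):
    """(alpha, beta) at the MSR point, equation (2)."""
    return Fraction(B, k), Fraction(B, k * (d - k + 1))


def mbr_point(B, k, d):
    """(alpha, beta) at the MBR point, equation (3)."""
    denom = k * (2 * d - k + 1)
    return Fraction(2 * d * B, denom), Fraction(2 * B, denom)


k, d = 20, 38
B_msr = k * (d - k + 1)            # file size from (4), beta = 1
B_mbr = k * d - k * (k - 1) // 2   # file size from (5), beta = 1
for name, point, B in (("MSR", msr_point, B_msr), ("MBR", mbr_point, B_mbr)):
    alpha, beta = point(B, k, d)
    # Both points meet the bound with equality.
    print(name, alpha, beta, cut_set_capacity(k, d, alpha, beta) == B)
```

Exact rational arithmetic is used so that the equality test against $B$ is not affected by rounding.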

There are two ways to regenerate data for a failed node. If 
the replacement data generated is exactly the same as those 
stored in the failed node, we call it the exact regeneration. 
If the replacement data generated is only to guarantee the 
data-reconstruction and regeneration properties, it is called 
functional regeneration. In practice, exact regeneration is more 
desired since there is no need to inform each node in the 
network regarding the replacement. Throughout this paper, we
only consider exact regeneration and design exact-regenerating 
codes with error-correction capabilities. 

B. Reed-Solomon codes 

Since Reed-Solomon (RS) codes will be used in the design 
of regenerating codes, we briefly describe the encoding and 
decoding mechanisms of RS codes next. 

RS codes are among the best-known error-correction codes.
They can not only recover data when nodes fail, but also
guarantee recovery when a subset of nodes is Byzantine. RS
codes operate on symbols of $m$ bits, where all symbols are
from the finite field $GF(2^m)$. An $[n, d]$ RS code is a linear code
with parameters $n = 2^m - 1$ and $n - d = 2t$, where $n$ is the
total number of symbols in a codeword, $d$ is the total number
of information symbols, and $t$ is the symbol-error-correction
capability of the code.

Encoding: Let the sequence of $d$ information symbols
in $GF(2^m)$ be $u = [u_0, u_1, \ldots, u_{d-1}]$ and $u(x)$ be the
information polynomial of $u$, represented as²

$$u(x) = u_0 + u_1 x + \cdots + u_{d-1} x^{d-1}.$$

The codeword polynomial, $c(x)$, corresponding to $u(x)$ can
be encoded as

$$c(x) = u(x)x^{n-d} + \left(u(x)x^{n-d} \bmod g(x)\right), \tag{6}$$



where $g(x)$ is a generator polynomial of the RS code. It is
well-known that $g(x)$ can be obtained as

$$g(x) = (x - a^b)(x - a^{b+1}) \cdots (x - a^{b+2t-1}) = g_0 + g_1 x + g_2 x^2 + \cdots + g_{2t} x^{2t}, \tag{7}$$

where $a$ is a generator (or a primitive element) in $GF(2^m)$, $b$
an arbitrary integer, and $g_i \in GF(2^m)$. The RS code defined
by (6) is a systematic code, where the information symbols
$u_0, u_1, \ldots, u_{d-1}$ occur as coefficients (symbols) in $c(x)$.

2 We use polynomial and vectorized representations of information symbols,
codewords, received symbols and errors interchangeably in this work.
Fig. 1. Block diagram of RS decoding. Above each block, the corresponding existing algorithms are indicated.



Another encoding method for RS codes is the encoder
proposed by Reed and Solomon [11], where the codeword
$c$ corresponding to the information sequence $u$ is

$$c = [u(a^0), u(a^1), u(a^2), \ldots, u(a^{n-1})]. \tag{8}$$

When $b = 1$, the codes generated by (6) and (8) are identical.
In this work, we adopt the latter encoding method.
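The evaluation encoder in (8) can be sketched in a few lines. The example below works over $GF(2^4)$ with the primitive polynomial $x^4 + x + 1$; the field size, the polynomial, and all function names are assumptions made for illustration only.

```python
# GF(2^4) log/antilog tables. The primitive polynomial x^4 + x + 1 is an
# illustrative choice, not one fixed by the text.
ORDER = 15
EXP, LOG = [0] * (2 * ORDER), [0] * (ORDER + 1)
x = 1
for e in range(ORDER):
    EXP[e] = EXP[e + ORDER] = x   # EXP[e] = a^e, duplicated to skip a modulo
    LOG[x] = e
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011


def gf_mul(p, q):
    """Multiply two field elements via discrete logarithms."""
    return 0 if p == 0 or q == 0 else EXP[LOG[p] + LOG[q]]


def poly_eval(coeffs, point):
    """Evaluate u(x) = coeffs[0] + coeffs[1] x + ... by Horner's rule."""
    acc = 0
    for c in reversed(coeffs):
        acc = gf_mul(acc, point) ^ c   # addition in GF(2^m) is XOR
    return acc


def rs_encode(u, n):
    """Codeword [u(a^0), u(a^1), ..., u(a^(n-1))] as in (8)."""
    return [poly_eval(u, EXP[j]) for j in range(n)]


codeword = rs_encode([1, 2, 3, 4, 5], 15)   # a [15, 5] code
print(codeword)
```

Because a nonzero polynomial of degree less than $d$ has at most $d - 1$ roots, every nonzero codeword produced this way has at least $n - d + 1$ nonzero symbols.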

Decoding: The decoding process of RS codes is more
complex. A complete description can be found in [12].

Let $r(x)$ be the received polynomial and
$r(x) = c(x) + e(x) + \gamma(x) = c(x) + \lambda(x)$, where
$e(x) = \sum_{j=0}^{n-1} e_j x^j$ is the error polynomial,
$\gamma(x) = \sum_{j=0}^{n-1} \gamma_j x^j$ the erasure polynomial,
and $\lambda(x) = \sum_{j=0}^{n-1} \lambda_j x^j = e(x) + \gamma(x)$ the errata polynomial.
Note that $g(x)$ and (hence) $c(x)$ have $a^b, a^{b+1}, \ldots, a^{b+2t-1}$
as roots. This property is used to determine the error locations
and recover the information symbols.

RS codes are optimal in that they provide the largest separation
among codewords, and an $[n, d]$ RS code can recover
from any $v$ errors as long as $v \le \lfloor (n - d - s)/2 \rfloor$, where $s$ is
the number of erasures (or irretrievable symbols). The basic
procedure of RS decoding is shown in Figure 1. The last
step in this figure is not necessary if a systematic RS code
is applied; otherwise, the last step of the decoding procedure
involves solving a set of linear equations, and can be made
efficient by the use of Vandermonde generator matrices [13].
Decoding that handles both errors and erasures is called
error-erasure decoding.

In $GF(2^m)$, addition is equivalent to bit-wise exclusive-or
(XOR), and multiplication is typically implemented with
multiplication tables or discrete logarithm tables. To reduce the
complexity of multiplication, Cauchy Reed-Solomon (CRS)
codes [14] have been proposed; they use a different construction
of the generator matrix and convert multiplications to XOR
operations for erasure decoding. However, CRS codes incur the same
complexity as RS codes for error correction.

III. Encoding and Decoding of Error-Correcting 
Exact-Regenerating Codes for the MSR Points 

In this section, we demonstrate how to perform error correction
on MSR codes designed to handle Byzantine failures
by extending the code construction in [10]. It has been proved
in [10] that an MSR code $C'$ with parameters $[n', k', d']$ for
any $2k' - 2 \le d' \le n' - 1$ can be constructed from an MSR
code $C$ with parameters $[n = n' + i, k = k' + i, d = d' + i]$,
where $d = 2k - 2$ and $i = d' - 2k' + 2$. Furthermore, if $C$ is
linear, so is $C'$. Hence, it is sufficient to design an MSR code
for $d = 2k - 2$. When $d = 2k - 2$ we have

$$\alpha = d - k + 1 = k - 1 = d/2$$

and

$$B = k\alpha = \alpha(\alpha + 1).$$

We assume that the symbols in data are elements from
$GF(2^m)$. Hence, the total data in bits is $mB$ bits for $\beta = 1$.

A. Verification for Data-Reconstruction 

Since we need to design codes with Byzantine fault tolerance,
it is necessary to perform an integrity check after the original
data is reconstructed. Two common verification mechanisms
can be used: CRC and hash functions. Both methods add
redundancy to the original data before it is encoded. Here
we adopt CRC since it is simple to implement and requires
less redundancy.

CRC uses a cyclic code (CRC code) such that each information
sequence can be verified using its generator polynomial
of degree $r$, where $r$ is the number of redundant bits added to the
information sequence [12], [15]. The fraction of errors that
can be detected by a CRC code is related to the number of
redundant bits. A CRC code with $r$ redundant bits fails to
detect a fraction of about $(1/2^r) \times 100\%$ of errors. For example,
when $r = 32$, the mis-detection error probability is on the
order of $10^{-10}$. Since the size of the original data is usually
large, the redundancy added by imposing a CRC code is
relatively small. For example, for a $[100, 20, 38]$ MSR code
with $\alpha = 19$ and $B = 19 \times 20 = 380$, we need to operate on
$GF(2^{11})$, so that the original data occupies 4180 bits in total. If
$r = 32$, then only 0.77% redundancy is added. Hence, in the
following, we assume that the CRC checksum has been added
to the original data and the resultant size is $B$ symbols.
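The redundancy figure above, and the way a checksum is attached and tested, can be illustrated as follows. This is a minimal sketch: `zlib.crc32` is used as a stand-in 32-bit CRC (the text does not fix a particular CRC polynomial), and the helper names are hypothetical.

```python
import zlib


def msr_crc_redundancy(n, k, d, r=32):
    """Field size m, total data bits, and CRC redundancy ratio for beta = 1."""
    alpha = d - k + 1                    # MSR point, equation (4)
    B = k * alpha                        # symbols, CRC included
    m = (n * alpha - 1).bit_length()     # smallest m with 2^m >= n * alpha
    total_bits = m * B
    return m, total_bits, r / total_bits


def attach_crc(data: bytes) -> bytes:
    """Append a 32-bit CRC to the data before it is encoded."""
    return data + zlib.crc32(data).to_bytes(4, "big")


def crc_test(blob: bytes) -> bool:
    """Recompute the CRC over the payload and compare with the stored tag."""
    payload, tag = blob[:-4], blob[-4:]
    return zlib.crc32(payload).to_bytes(4, "big") == tag


m, total_bits, ratio = msr_crc_redundancy(100, 20, 38)
print(m, total_bits, f"{100 * ratio:.2f}%")

blob = attach_crc(b"original data file")
tampered = bytes([blob[0] ^ 0x01]) + blob[1:]   # flip one bit
print(crc_test(blob), crc_test(tampered))
```

A single flipped bit is always caught by a CRC; the $1/2^r$ figure concerns the fraction of arbitrary error patterns that slip through.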

B. Encoding 

We arrange the information sequence $m = [m_0, m_1, \ldots, m_{B-1}]$
into an information vector $U$ of size $\alpha \times d$ such that

$$u_{ij} = \begin{cases} u_{ji} = m_{k_1} & \text{for } i \le j \le \alpha \\ u_{(j-\alpha)(i+\alpha)} = m_{k_2} & \text{for } i + \alpha \le j \le 2\alpha, \end{cases}$$

where $k_1 = (i - 1)(\alpha + 1) - i(i + 1)/2 + j$ and
$k_2 = (\alpha + 1)(i - 1 + \alpha/2) - i(i + 1)/2 + (j - \alpha)$. Let $U = [A_1 \; A_2]$.
From the above construction, $A_j$ is a symmetric matrix of
dimension $\alpha \times \alpha$ for $j = 1, 2$.
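The arrangement of the $B = \alpha(\alpha + 1)$ symbols into $U = [A_1 \; A_2]$ can be sketched as below. The sketch assumes 1-based row and column indices $i, j$ and 0-based message indices $k_1, k_2$; the function name is hypothetical.

```python
def build_msr_matrix(m, alpha):
    """Arrange alpha*(alpha+1) symbols into U = [A1 A2], alpha x 2*alpha."""
    assert len(m) == alpha * (alpha + 1)
    U = [[None] * (2 * alpha) for _ in range(alpha)]
    for i in range(1, alpha + 1):
        # Upper triangle of A1, mirrored to keep A1 symmetric.
        for j in range(i, alpha + 1):
            k1 = (i - 1) * (alpha + 1) - i * (i + 1) // 2 + j
            U[i - 1][j - 1] = U[j - 1][i - 1] = m[k1]
        # Upper triangle of A2 (columns alpha+1 .. 2*alpha), mirrored likewise.
        for j in range(i + alpha, 2 * alpha + 1):
            # (alpha + 1) * (i - 1 + alpha / 2), kept in integer arithmetic
            offset = (alpha + 1) * (2 * (i - 1) + alpha) // 2
            k2 = offset - i * (i + 1) // 2 + (j - alpha)
            U[i - 1][j - 1] = U[j - alpha - 1][i + alpha - 1] = m[k2]
    return U


for row in build_msr_matrix(list(range(6)), 2):
    print(row)
```

With $\alpha = 2$ and $m = [0, 1, \ldots, 5]$, the rows are `[0, 1, 3, 4]` and `[1, 2, 4, 5]`, so $A_1$ takes the first $\alpha(\alpha+1)/2$ symbols and $A_2$ the rest.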

In this encoding, each row of the information vector $U$
produces a codeword of length $n$. An $[n, d = 2\alpha]$ RS code is
adopted to construct the MSR code. In particular, for the $i$th
row of $U$, the corresponding codeword is

$$[p_i(a^0 = 1), p_i(a^1), \ldots, p_i(a^{n-1})], \tag{9}$$
where $p_i(x)$ is a polynomial with all elements in the $i$th row
of $U$ as its coefficients, that is, $p_i(x) = \sum_{j=0}^{d-1} u_{ij} x^j$, and $a$ is
a generator of $GF(2^m)$. In matrix form, we have

$$U \cdot G = C,$$

where

$$G = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
a^0 & a^1 & \cdots & a^{n-1} \\
(a^0)^2 & (a^1)^2 & \cdots & (a^{n-1})^2 \\
\vdots & \vdots & \ddots & \vdots \\
(a^0)^{d-1} & (a^1)^{d-1} & \cdots & (a^{n-1})^{d-1}
\end{bmatrix},$$

and $C$ is the codeword vector with dimension $(\alpha \times n)$. Finally,
the $i$th column of $C$ is distributed to storage node $i$ for
$1 \le i \le n$.

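The generator matrix $G$ of the RS code can be reformulated
as

$$G = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
a^0 & a^1 & \cdots & a^{n-1} \\
(a^0)^2 & (a^1)^2 & \cdots & (a^{n-1})^2 \\
\vdots & \vdots & \ddots & \vdots \\
(a^0)^{\alpha-1} & (a^1)^{\alpha-1} & \cdots & (a^{n-1})^{\alpha-1} \\
(a^0)^{\alpha} \cdot 1 & (a^1)^{\alpha} \cdot 1 & \cdots & (a^{n-1})^{\alpha} \cdot 1 \\
(a^0)^{\alpha} a^0 & (a^1)^{\alpha} a^1 & \cdots & (a^{n-1})^{\alpha} a^{n-1} \\
(a^0)^{\alpha} (a^0)^2 & (a^1)^{\alpha} (a^1)^2 & \cdots & (a^{n-1})^{\alpha} (a^{n-1})^2 \\
\vdots & \vdots & \ddots & \vdots \\
(a^0)^{\alpha} (a^0)^{\alpha-1} & (a^1)^{\alpha} (a^1)^{\alpha-1} & \cdots & (a^{n-1})^{\alpha} (a^{n-1})^{\alpha-1}
\end{bmatrix}
= \begin{bmatrix} \bar{G} \\ \bar{G}\Delta \end{bmatrix},$$

where $\bar{G}$ contains the first $\alpha$ rows of $G$ and $\Delta$ is a diagonal
matrix with $(a^0)^\alpha, (a^1)^\alpha, (a^2)^\alpha, \ldots, (a^{n-1})^\alpha$ as diagonal
elements. It is easy to see that the $\alpha$ symbols stored in storage
node $i$ are

$$U \cdot \begin{bmatrix} g_i^T \\ (a^{i-1})^\alpha g_i^T \end{bmatrix} = A_1 g_i^T + (a^{i-1})^\alpha A_2 g_i^T,$$

where $g_i^T$ is the $i$th column of $\bar{G}$.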

A final remark is that each column in G can be generated 
by knowing the index of the column and the generator a. 
Therefore, each storage node does not need to store the entire 
G to perform exact-regeneration. 
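To make the last two observations concrete, the sketch below encodes a small $U = [A_1 \; A_2]$ and checks that the content of node $i$ equals $A_1 g_i^T + (a^{i-1})^\alpha A_2 g_i^T$, with $g_i$ generated from the node index alone. The field $GF(2^4)$, its primitive polynomial, the toy parameters ($\alpha = 2$, $n = 6$) and all names are assumptions for illustration.

```python
# GF(2^4) tables; the primitive polynomial x^4 + x + 1 is an illustrative choice.
ORDER = 15
EXP, LOG = [0] * (2 * ORDER), [0] * (ORDER + 1)
x = 1
for e in range(ORDER):
    EXP[e] = EXP[e + ORDER] = x
    LOG[x] = e
    x <<= 1
    if x & 0b10000:
        x ^= 0b10011


def gf_mul(p, q):
    return 0 if p == 0 or q == 0 else EXP[LOG[p] + LOG[q]]


def gf_pow(p, e):
    if e == 0:
        return 1
    return 0 if p == 0 else EXP[(LOG[p] * e) % ORDER]


def encode(U, n):
    """C = U * G, where column c of G is [1, a^c, (a^c)^2, ...]^T."""
    C = []
    for row in U:
        codeword = []
        for c in range(n):
            acc = 0
            for coeff in reversed(row):          # Horner evaluation at a^c
                acc = gf_mul(acc, EXP[c]) ^ coeff
            codeword.append(acc)
        C.append(codeword)
    return C


def node_content(A1, A2, i, alpha):
    """A1 g_i^T + (a^(i-1))^alpha A2 g_i^T for node i (1-based)."""
    point = EXP[i - 1]
    g = [gf_pow(point, t) for t in range(alpha)]  # built from the index only
    scale = gf_pow(point, alpha)
    out = []
    for r in range(alpha):
        s1 = s2 = 0
        for t in range(alpha):
            s1 ^= gf_mul(A1[r][t], g[t])
            s2 ^= gf_mul(A2[r][t], g[t])
        out.append(s1 ^ gf_mul(scale, s2))
    return out


alpha, n = 2, 6
A1 = [[1, 2], [2, 3]]
A2 = [[4, 5], [5, 6]]
U = [A1[r] + A2[r] for r in range(alpha)]
C = encode(U, n)
for i in range(1, n + 1):
    stored = [C[r][i - 1] for r in range(alpha)]  # i-th column of C
    print(i, stored, stored == node_content(A1, A2, i, alpha))
```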

C. Decoding for Data-Reconstruction 

The generator polynomial of the RS code encoded by (9)
has $a^{n-d}, a^{n-d-1}, \ldots, a$ as roots [12]. Without loss of
generality, we assume that the data collector retrieves encoded
symbols from $k$ storage nodes $j_0, j_1, \ldots, j_{k-1}$. First,
the information sequence $m$ is recovered by the procedure
given in [10]. Note that the procedure in [10] requires that
$(a^0)^\alpha, (a^1)^\alpha, (a^2)^\alpha, \ldots, (a^{n-1})^\alpha$ all be distinct. This can
be guaranteed if the code is over $GF(2^m)$ for $m \ge \lceil \log_2 n\alpha \rceil$.
If the recovered information sequence does not pass the
CRC, then we need to perform error-erasure decoding.
In addition to the encoded symbols received from $k$ storage
nodes, the data collector needs to retrieve the encoded symbols
from $d + 2 - k$ of the remaining storage nodes.

The data collector then performs error-erasure decoding to
obtain $\tilde{C}$, the first $d$ columns of the codeword vector. Let $\tilde{G}$
be the first $d$ columns of $G$. Then the recovered information
sequence can be obtained from

$$\tilde{U} = \tilde{C} \cdot \tilde{G}^{-1}, \tag{10}$$

where $\tilde{G}^{-1}$ is the inverse of $\tilde{G}$, which always exists. If the
recovered information sequence passes the CRC, decoding is done;
otherwise, two more symbols need to be retrieved. The data
collector continues the decoding process until it successfully
recovers the correct information sequence or no more storage
nodes can be accessed. In each step, the progressive decoding
that we proposed in [16] is applied to reduce the computation
complexity. Note that the RS code used is capable of correcting
up to $\lfloor (n - d)/2 \rfloor$ errors.

The decoding algorithm is summarized in Algorithm 1. Note
that, in practice, Algorithm 1 will be repeated $\beta$ times for each
retrieved symbol when $\beta > 1$.

Algorithm 1: Decoding of MSR Codes for Data-Reconstruction

begin
    The data collector randomly chooses $k$ storage nodes
        and retrieves encoded data, $Y_{\alpha \times k}$;
    Perform the procedure given in [10] to recover $\tilde{m}$;
    if CRCTest($\tilde{m}$) = SUCCESS then
        return $\tilde{m}$;
    else
        Retrieve $d - k$ more encoded data from remaining
            storage nodes and merge them into $Y_{\alpha \times d}$;
        $i \leftarrow d$;
        while $i \le n - 2$ do
            $i \leftarrow i + 2$;
            Retrieve two more encoded data from
                remaining storage nodes and merge them into $Y_{\alpha \times i}$;
            Perform progressive error-erasure decoding
                on each row in $Y$ to recover $\tilde{C}$;
            Obtain $\tilde{U}$ by (10) and convert it to $\tilde{m}$;
            if CRCTest($\tilde{m}$) = SUCCESS then
                return $\tilde{m}$;
        return FAIL;
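The control flow of Algorithm 1 can be sketched as follows. Only the retrieval loop is modeled: the fast decoder of [10], the error-erasure decoder, and the CRC test are passed in as callbacks, and the `simulate` helper replaces them with stubs (an assumption made purely to exercise the loop, not a real RS decoder).

```python
def progressive_reconstruct(n, k, d, fetch, fast_decode, ee_decode, crc_test):
    """Return (message, nodes accessed), or (None, nodes accessed) on failure."""
    Y = fetch(k)                          # symbols from k randomly chosen nodes
    msg = fast_decode(Y)                  # error-free procedure of [10]
    if crc_test(msg):
        return msg, k
    Y = Y + fetch(d - k)                  # now d nodes have been accessed
    i = d
    while i <= n - 2:
        i += 2
        Y = Y + fetch(2)                  # two more nodes per round
        msg = ee_decode(Y)                # error-erasure decoding on i symbols
        if msg is not None and crc_test(msg):
            return msg, i
    return None, i


def simulate(n, k, d, corrupt):
    """Stub run: `corrupt` is the set of node indices that return bad data."""
    order = iter(range(n))
    truth = "file"

    def fetch(count):
        return [next(order) for _ in range(count)]

    def fast_decode(Y):
        return truth if not corrupt & set(Y) else "garbage"

    def ee_decode(Y):
        # Stub for an [n, d] RS decoder: with len(Y) symbols in hand it can
        # correct up to floor((len(Y) - d) / 2) errors.
        errors = len(corrupt & set(Y))
        return truth if 2 * errors <= len(Y) - d else None

    return progressive_reconstruct(
        n, k, d, fetch, fast_decode, ee_decode, lambda msg: msg == truth
    )


print(simulate(10, 3, 4, set()))      # no Byzantine node
print(simulate(10, 3, 4, {0}))        # one Byzantine node
print(simulate(10, 3, 4, {0, 1}))     # two Byzantine nodes
```

In this stub, each additional erroneous node costs two more accessed nodes, which is the behavior the progressive scheme is designed around.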



D. Verification for Regeneration 

To verify whether the recovered data are the same as those 
stored in the failed node, integrity check is needed. However, 
such check should be performed based on information stored 
on nodes other than the failed node. We consider two mech- 
anisms for verification. 

In the first scheme, each storage node keeps the CRC
checksums for the other $n - 1$ storage nodes. When the helper
accesses $d$ surviving storage nodes, it also asks them for the
CRC checksum of the failed node. Using a majority
vote over all received CRC checksums, the helper can obtain
the correct CRC checksum if no more than $\lfloor (d - 1)/2 \rfloor$
accessed storage nodes are compromised. To see the storage
complexity of this scheme, let us take a numerical example.
Consider a $[100, 20, 38]$ MSR code with $\alpha = 19$,
$B = 4.18$ Mbit, and $\beta = 1000$. The total number of bits stored in each node
is then $19 \times 11 \times 1000 = 209000$ bits. If a 32-bit CRC
checksum is added for each storage node, the redundancy is
$r(n - 1)/(\beta\alpha m) = 32 \times 99/209000 \approx 1.5\%$ and the extra
bandwidth for transmitting the CRC checksums is around
$rd/(\beta d m) = 1216/418000 \approx 0.3\%$. Hence, both redundancy
for storage and bandwidth are manageable for large $\beta$.

When $\beta$ is small, we adopt an error-correcting code to
encode the $r$-bit CRC checksum. This can improve the storage
and bandwidth efficiency. First we select the operating finite
field $GF(2^{m'})$ such that $2^{m'} \ge n - 1$. Then an $[n - 1, k']$ RS
code with $k' = \lceil r/m' \rceil$ is used to encode the CRC checksum.
Note that this code is different from the RS code used for MSR
data regeneration. In encoding the CRC checksum of a storage
node into $n - 1$ symbols and distributing them to the $n - 1$
other storage nodes, an extra $(n - 1)m'$ bits are needed on each
storage node. When the helper accesses $d$ storage nodes to
repair the failed node $i$, these nodes also send out the symbols
associated with the CRC checksum for node $i$. The helper
can then perform error-erasure decoding to recover the CRC
checksum. The maximum number of compromised storage
nodes among the accessed $d$ nodes that can be handled by
this approach is $\lfloor (d - k')/2 \rfloor$ and the extra bandwidth is $dm'$.
Since $m'$ is much smaller than $n - 1$ and $r$, the redundancy
for storage and bandwidth can be reduced.
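The two checksum-distribution schemes can be compared numerically with the sketch below. It plugs in the $[100, 20, 38]$ example from the text; the function and dictionary key names are hypothetical, and the repair traffic is taken to be $\beta d m$ bits as in the example above.

```python
def checksum_overheads(n, d, alpha, beta, m, r=32):
    """Per-node storage and per-repair bandwidth cost of both schemes."""
    node_bits = beta * alpha * m        # data bits stored on one node
    repair_bits = beta * d * m          # data bits sent during one repair
    m_prime = (n - 2).bit_length()      # smallest m' with 2^m' >= n - 1
    k_prime = -(-r // m_prime)          # ceil(r / m')

    # Scheme 1: every node keeps the full r-bit CRC of each other node.
    replicated = {
        "storage_bits": r * (n - 1),
        "bandwidth_bits": r * d,
        "storage_ratio": r * (n - 1) / node_bits,
        "bandwidth_ratio": r * d / repair_bits,
        "tolerated": (d - 1) // 2,      # majority vote
    }
    # Scheme 2: each CRC is RS-encoded into n - 1 symbols of m' bits.
    coded = {
        "storage_bits": (n - 1) * m_prime,
        "bandwidth_bits": d * m_prime,
        "storage_ratio": (n - 1) * m_prime / node_bits,
        "bandwidth_ratio": d * m_prime / repair_bits,
        "tolerated": (d - k_prime) // 2,
    }
    return replicated, coded


replicated, coded = checksum_overheads(n=100, d=38, alpha=19, beta=1000, m=11)
for name, scheme in (("replicated", replicated), ("coded", coded)):
    print(name, scheme)
```

For these parameters the coded scheme stores 693 instead of 3168 checksum bits per node, at the price of tolerating 16 rather than 18 compromised helpers.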

E. Decoding for Regeneration 

Let node $i$ be the failed node to be recovered. During
regeneration, the helper accesses $s$ surviving storage nodes,
where $d \le s \le n - 1$. Without loss of generality, we assume
that the storage nodes accessed are $j_0, j_1, \ldots, j_{s-1}$. Every
accessed node takes the inner product between its $\alpha$ symbols
and

$$g_i = [1, (a^{i-1})^1, (a^{i-1})^2, \ldots, (a^{i-1})^{\alpha-1}], \tag{11}$$

where $g_i$ can be generated from the index $i$ and the generator
$a$, and sends the resultant symbol to the helper. Since the
MSR code is a linear code, the resultant symbols transmitted,
$y_{j_0}, y_{j_1}, y_{j_2}, \ldots, y_{j_{s-1}}$, can be decoded to the codeword $c$,
where

$$c = g_i \cdot (U \cdot G) = (g_i \cdot U) \cdot G,$$

if $(n - s) + 2e < n - d + 1$, where $e$ is the number of errors
among the $s$ resultant symbols. Multiplying the first $d$ symbols
of $c$ by the inverse of the first $d$ columns of $G$, i.e.,
$\tilde{G}^{-1}$, one can recover $g_i \cdot U$, which is equivalent to

$$g_i \cdot [A_1 \; A_2] = [g_i A_1 \;\; g_i A_2].$$

Recall that $g_i$ is the transpose of the $i$th column of $\bar{G}$, the first
$\alpha$ rows of $G$. Since $A_j$, for $j = 1, 2$, are symmetric matrices,
$(g_i A_j)^T = A_j g_i^T$. The $\alpha$ symbols stored in the failed node $i$
can then be calculated as

$$(g_i A_1)^T + (a^{i-1})^\alpha (g_i A_2)^T. \tag{12}$$

The progressive decoding procedure in [16] can be applied
in decoding $y_{j_0}, y_{j_1}, y_{j_2}, \ldots, y_{j_{s-1}}$. First, the helper accesses
$d$ storage nodes and decodes $y_{j_0}, y_{j_1}, y_{j_2}, \ldots, y_{j_{d-1}}$ to
obtain $c$ and the $\alpha$ symbols by (12). Then, it verifies the CRC
checksum. If the CRC check is passed, the regeneration
is successful; otherwise, two more surviving storage nodes
need to be accessed. Then the helper decodes the received
$y_{j_0}, y_{j_1}, y_{j_2}, \ldots, y_{j_{d+1}}$ to obtain $c$ and recover the $\alpha$ symbols.
The process repeats until a sufficient number of correctly stored
data have been retrieved to recover the failed node. Again, in
practice, when $\beta > 1$, the decoding needs to be performed
$\beta$ times to recover $\beta\alpha$ symbols before verifying the CRC
checksum. The data regenerating algorithm is summarized in
Algorithm 2.



Algorithm 2: Decoding of MSR Codes for Regeneration

begin
    Assume node $i$ has failed;
    The helper randomly chooses $d$ storage nodes;
    Each chosen storage node arranges its symbols as a
        $(\beta \times \alpha)$ matrix and multiplies it by $g_i^T$, with $g_i$ given in (11);
    The helper collects these resultant vectors as a
        $(\beta \times d)$ matrix $Y$;
    The helper obtains the CRC checksum for node $i$;
    $i \leftarrow d$;
    repeat
        Perform progressive error-erasure decoding on
            each row in $Y$ to recover $\tilde{C}$ (error-erasure
            decoding is performed $\beta$ times);
        $M = \tilde{C}\tilde{G}^{-1}$, where $\tilde{G}^{-1}$ is the inverse of the
            first $d$ columns of $G$;
        Obtain the $\beta\alpha$ information symbols, $s$, from $M$
            by the method given in (12);
        if CRCTest($s$) = SUCCESS then
            return $s$;
        else
            $i \leftarrow i + 2$;
            The helper accesses two more remaining
                storage nodes;
            Each chosen storage node arranges its
                symbols as a $(\beta \times \alpha)$ matrix and multiplies it
                by $g_i^T$, with $g_i$ given in (11);
            The helper merges the resultant vectors into $Y_{\beta \times i}$;
    until $i > n - 2$;
    return FAIL;



IV. Encoding and Decoding of Error-Correcting
Exact-Regenerating Codes for the MBR Points

In this section we demonstrate that by selecting the same RS
codes as those for MSR codes and designing a proper decoding
procedure, the MBR codes in [10] can be extended to handle
Byzantine failures. Since the verification procedure for MBR
codes is the same as that of MSR codes, it is omitted.

A. Encoding 

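Let the information sequence $m = [m_0, m_1, \ldots, m_{B-1}]$ be
arranged into an information vector $U$ of size $\alpha \times d$ such
that

$$u_{ij} = \begin{cases} u_{ji} = m_{k_1} & \text{for } i \le j \le k \\ u_{ji} = m_{k_2} & \text{for } k + 1 \le i \le d,\ 1 \le j \le k \\ 0 & \text{otherwise,} \end{cases}$$

where $k_1 = (i - 1)(k + 1) - i(i + 1)/2 + j$ and
$k_2 = (i - k - 1)k + k(k + 1)/2 + j - 1$. In matrix form, we have

$$U = \begin{bmatrix} A_1 & A_2^T \\ A_2 & \mathbf{0} \end{bmatrix}, \tag{13}$$

where $A_1$ is a $k \times k$ matrix, $A_2$ a $(d - k) \times k$ matrix, and $\mathbf{0}$ is the
$(d - k) \times (d - k)$ zero matrix. Both $A_1$ and $U$ are symmetric.
It is clear that $U$ has dimension $d \times d$ (or $\alpha \times d$).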
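The MBR arrangement can be sketched in the same way as the MSR one. The sketch assumes 1-based indices $i, j$, 0-based message indices running from $0$ to $B - 1$ (so the $A_2$ block starts at index $k(k+1)/2$), and a hypothetical function name.

```python
def build_mbr_matrix(m, k, d):
    """Arrange B = kd - k(k-1)/2 symbols into the symmetric d x d matrix U."""
    B = k * d - k * (k - 1) // 2
    assert len(m) == B
    U = [[0] * d for _ in range(d)]      # lower-right block stays zero
    # A1: upper triangle of the top-left k x k block, mirrored.
    for i in range(1, k + 1):
        for j in range(i, k + 1):
            k1 = (i - 1) * (k + 1) - i * (i + 1) // 2 + j
            U[i - 1][j - 1] = U[j - 1][i - 1] = m[k1]
    # A2: rows k+1..d, columns 1..k, mirrored into A2^T.
    for i in range(k + 1, d + 1):
        for j in range(1, k + 1):
            k2 = (i - k - 1) * k + k * (k + 1) // 2 + j - 1
            U[i - 1][j - 1] = U[j - 1][i - 1] = m[k2]
    return U


for row in build_mbr_matrix([10, 11, 12, 13, 14], k=2, d=3):
    print(row)
```

For $k = 2$, $d = 3$ this yields the rows `[10, 11, 13]`, `[11, 12, 14]` and `[13, 14, 0]`, matching the block structure of (13).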

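We apply an $[n, d]$ RS code to encode each row of $U$. Let
$p_i(x)$ be the polynomial with all elements in the $i$th row of $U$ as its
coefficients, that is, $p_i(x) = \sum_{j=0}^{d-1} u_{ij} x^j$. The corresponding
codeword of $p_i(x)$ is thus

$$[p_i(a^0 = 1), p_i(a^1), \ldots, p_i(a^{n-1})]. \tag{14}$$

Recall that $a$ is a generator of $GF(2^m)$. In matrix form, we
have

$$U \cdot G = C,$$

where

$$G = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
a^0 & a^1 & \cdots & a^{n-1} \\
(a^0)^2 & (a^1)^2 & \cdots & (a^{n-1})^2 \\
\vdots & \vdots & \ddots & \vdots \\
(a^0)^{k-1} & (a^1)^{k-1} & \cdots & (a^{n-1})^{k-1} \\
(a^0)^{k} & (a^1)^{k} & \cdots & (a^{n-1})^{k} \\
\vdots & \vdots & \ddots & \vdots \\
(a^0)^{d-1} & (a^1)^{d-1} & \cdots & (a^{n-1})^{d-1}
\end{bmatrix}$$

and $C$ is the codeword vector with dimension $(\alpha \times n)$. $G$ is
called the generator matrix of the $[n, d]$ RS code. $G$ can be
divided into two sub-matrices as

$$G = \begin{bmatrix} G_k \\ B \end{bmatrix},$$

where

$$G_k = \begin{bmatrix}
1 & 1 & \cdots & 1 \\
a^0 & a^1 & \cdots & a^{n-1} \\
(a^0)^2 & (a^1)^2 & \cdots & (a^{n-1})^2 \\
\vdots & \vdots & \ddots & \vdots \\
(a^0)^{k-1} & (a^1)^{k-1} & \cdots & (a^{n-1})^{k-1}
\end{bmatrix} \tag{15}$$

and

$$B = \begin{bmatrix}
(a^0)^{k} & (a^1)^{k} & \cdots & (a^{n-1})^{k} \\
\vdots & \vdots & \ddots & \vdots \\
(a^0)^{d-1} & (a^1)^{d-1} & \cdots & (a^{n-1})^{d-1}
\end{bmatrix}.$$

Note that $G_k$ is a generator matrix of the $[n, k]$ RS code and it
will be used in the decoding process for data-reconstruction.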

B. Decoding for Data-Reconstruction 

The generator polynomial of the RS code encoded by (13)
has $a^{n-k}, a^{n-k-1}, \ldots, a$ as roots [12]. Hence, the progressive
decoding scheme given in [16] can be applied to decode the
proposed code if there are errors in the retrieved data. Unlike
the decoding procedure given in Section III-C, where an $[n, d]$ RS
decoder is applied, we need an $[n, k]$ RS decoder for MBR
codes.

Without loss of generality, we assume that the data
collector retrieves encoded symbols from $s$ storage nodes
$j_0, j_1, \ldots, j_{s-1}$, $k \le s \le n$. Recall that $\alpha = d$ in MBR.
Hence, the data collector receives $d$ vectors, where each vector
has $s$ symbols. Collect the first $k$ vectors as $Y_k$ and the
remaining $d - k$ vectors as $Y_{d-k}$. From (13), we can view the
codewords in the last $d - k$ rows of $C$ as being encoded by
$G_k$ instead of $G$. Hence, the decoding procedure of $[n, k]$ RS
codes can be applied on $Y_{d-k}$ to recover the codewords in the
last $d - k$ rows of $C$. Let $\tilde{G}_k$ be the first $k$ columns of $G_k$
and $\tilde{C}_{d-k}$ be the recovered codewords in the last $d - k$ rows
of $C$. $A_2$ in $U$ can be recovered as

$$\tilde{A}_2 = \tilde{C}_{d-k} \cdot \tilde{G}_k^{-1}. \tag{16}$$

We then calculate $\tilde{A}_2^T \cdot B$ and only keep the $j_0$th, $j_1$th, $\ldots$,
$j_{s-1}$th columns of the resultant matrix as $E$, and subtract $E$
from $Y_k$:

$$Y'_k = Y_k - E. \tag{17}$$

Applying the RS decoding algorithm again on $Y'_k$, we can
recover $A_1$ as

$$\tilde{A}_1 = \tilde{C}_k \cdot \tilde{G}_k^{-1}. \tag{18}$$

A CRC checksum is computed on the decoded information
sequence to verify the recovered data. If the CRC is passed, the
data reconstruction is successful; otherwise the progressive
decoding procedure is applied, where two more storage nodes
need to be accessed from the remaining storage nodes in
each round until no further errors are detected. The
data-reconstruction algorithm is summarized in Algorithm 3.

C. Decoding for Regeneration 

Decoding for regeneration with MBR is very similar to that
with MSR. After obtaining $g_i \cdot U$, we take its transpose.
Since $U$ is symmetric, we have $U^T = U$ and

$$(g_i \cdot U)^T = U^T \cdot g_i^T = U \cdot g_i^T.$$

A CRC check is performed on all $\beta\alpha$ symbols. If the CRC
check is passed, the $\beta\alpha$ symbols are the data stored in the
failed node; otherwise, the progressive decoding procedure is
applied.
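TABLE I

Evaluation of MSR and MBR codes

| | MSR code: Data-reconstruction | MSR code: Regeneration | MBR code: Data-reconstruction | MBR code: Regeneration |
|---|---|---|---|---|
| Fault-tolerant capability against erasures | $n - k$ | $n - d$ | $n - k$ | $n - d$ |
| Fault-tolerant capacity against Byzantine faults | $\lfloor \frac{n-d}{2} \rfloor$ | $\min\{\lfloor \frac{n-d}{2} \rfloor, \lfloor \frac{d-k'}{2} \rfloor\}$ | $\lfloor \frac{n-k}{2} \rfloor$ | $\min\{\lfloor \frac{n-d}{2} \rfloor, \lfloor \frac{d-k'}{2} \rfloor\}$ |
| Security strength under forgery attack | $\min\{k, \lceil \frac{n-d+2}{2} \rceil\} - 1$ | $\min\{d, \lceil \frac{n-d+2}{2} \rceil\} - 1$ | $\min\{k, \lceil \frac{n-k+2}{2} \rceil\} - 1$ | $\min\{d, \lceil \frac{n-d+2}{2} \rceil\} - 1$ |
| Redundancy ratio on storage (bits) | $\frac{r}{mk\alpha - r}$ | $\frac{(n-1)m'}{\beta\alpha m}$ | $\frac{r}{m(kd - k(k-1)/2) - r}$ | $\frac{(n-1)m'}{\beta\alpha m}$ |
| Redundancy ratio on bandwidth (bits) | | $\frac{dm'}{\beta m d} = \frac{m'}{\beta m}$ | | $\frac{dm'}{\beta m d} = \frac{m'}{\beta m}$ |

where $k' = \lceil r/m' \rceil$ and $m' = \lceil \log_2(n - 1) \rceil$.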



Algorithm 3: Decoding of MBR Codes for Data-Reconstruction

begin
    The data collector randomly chooses $k$ storage nodes
        and retrieves encoded data, $Y_{d \times k}$;
    $i \leftarrow k$;
    repeat
        Perform progressive error-erasure decoding on the
            last $d - k$ rows in $Y$ to recover $\tilde{C}_{d-k}$ (error-erasure
            decoding is performed $d - k$ times);
        Calculate $\tilde{A}_2$ via (16);
        Calculate $\tilde{A}_2^T \cdot B$ and obtain $Y'_k$ via (17);
        Perform progressive error-erasure decoding on $Y'_k$
            to recover the first $k$ rows in the codeword vector
            (error-erasure decoding is performed $k$ times);
        Calculate $\tilde{A}_1$ via (18);
        Recover the information sequence $s$ from $\tilde{A}_1$ and $\tilde{A}_2$;
        if CRCTest($s$) = SUCCESS then
            return $s$;
        else
            $i \leftarrow i + 2$;
            Retrieve two more encoded data from
                remaining storage nodes and merge them into $Y_{d \times i}$;
    until $i > n - 2$;
    return FAIL;



V. Analysis 

In this section, we provide an analytical study of the 
fault-tolerant capability, security strength, and storage and 
bandwidth efficiency of the proposed schemes. 

A. Fault-tolerant capability 

In analyzing the fault-tolerant capability, we consider two 
types of failures, namely crash-stop failures and Byzantine 
failures. Nodes are assumed to fail independently (as opposed
to in a coordinated fashion). In both cases, the fault-tolerant
capacity is measured by the maximum number of failures that 
the system can handle to remain functional. 



Crash-stop failure: Crash-stop failures can be viewed as 
erasure in the codeword. Since at least k nodes need to be 
available for data-reconstruction, it is easy to show that the 
maximum number of crash-stop failures that can be tolerated 
in data-reconstruction is n — k. For regeneration, d nodes need 
to be accessed. Thus, the fault-tolerant capability is n— d. Note 
that since live nodes all contain correct data, CRC checksum 
is also correct. 

Byzantine failure: In general, in RS codes, two additional
correct code fragments are needed to correct one erroneous
code fragment. However, in the case of data regeneration,
the capability of the helper to obtain the correct CRC checksum
also matters. In the analysis, we assume that the error-correction
code is used in the process to obtain the correct
CRC checksum. Data regeneration will fail if the helper cannot
obtain the correct CRC checksum even when the number of
failed nodes is less than the maximum number of faults the
RS code can handle. Hence, we must take the minimum of
the capability of the RS code (in MBR and MSR) and the
capability to recover the correct CRC checksum. Thus, with
MSR and MBR codes, $\lfloor (n-d)/2 \rfloor$ and $\lfloor (n-k)/2 \rfloor$ erroneous nodes,
respectively, can be tolerated in data reconstruction. On the other hand,
the fault-tolerant capacity of MSR and MBR codes for data
regeneration is $\min\{\lfloor (n-d)/2 \rfloor, \lfloor (d-k')/2 \rfloor\}$ in both cases.
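The fault-tolerance figures derived in this subsection can be tabulated for a concrete code with the following sketch. The function name and dictionary layout are hypothetical, and the checksum is assumed to be protected by the $[n - 1, k']$ RS code of Section III-D with a 32-bit CRC.

```python
def fault_tolerance(n, k, d, r=32):
    """Maximum tolerable crash-stop and Byzantine failures."""
    m_prime = (n - 2).bit_length()       # smallest m' with 2^m' >= n - 1
    k_prime = -(-r // m_prime)           # ceil(r / m')
    # Regeneration is limited by both the data code and the checksum code.
    regeneration = min((n - d) // 2, (d - k_prime) // 2)
    return {
        "crash_stop": {"reconstruction": n - k, "regeneration": n - d},
        "byzantine": {
            "msr_reconstruction": (n - d) // 2,
            "mbr_reconstruction": (n - k) // 2,
            "regeneration": regeneration,
        },
    }


result = fault_tolerance(100, 20, 38)
print(result["crash_stop"])
print(result["byzantine"])
```

For the $[100, 20, 38]$ example, regeneration tolerates 16 Byzantine nodes: the checksum code, not the data code, is the limiting factor.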

B. Security Strength 

In analyzing the security strength, we consider forgery
attacks, where polluters [9], a type of Byzantine attackers, try
to disrupt the data-reconstruction and regeneration processes by
forging data cooperatively. In other words, collusion among
polluters is considered. We want to determine the minimum
number of polluters needed to forge the data in data-reconstruction
and regeneration. The security strength is therefore one less
than this number. Forgery in data regeneration is useful when an
attacker only has access to a small set of nodes but, through the
data regeneration process, "pollutes" the data on other storage
nodes and thus ultimately causes valid but erroneous
data-reconstruction.

In data-reconstruction, for worst-case analysis, we consider
the security strength such that only one row of $U$ is modified.³

³ Due to symmetry in $U$, most of the time, making changes on a row in $U$
results in changes on several rows simultaneously.
Let the polluters be $j_0, j_1, \ldots, j_{v-1}$, who can collude to forge
the information symbols. Suppose that $\tilde{y}$ is the forged row in
$U$. Let $\tilde{y} = y + u$, where $u$ denotes the real information
symbols in the row of $U$. Then, according to the RS encoding procedure,
we have

$$\tilde{y}G = (y + u)G = yG + uG = v + c, \qquad (19)$$

where $c$ is the original data stored in the storage nodes and $v$ is
the modification that must be made by the polluters. Let the number
of nonzero symbols in $v$ be $h$. It is clear that $h \ge n - d + 1$,
where $n - d + 1$ is the minimum Hamming distance of the RS
code, since $v$ must be a nonzero codeword. For worst-case
consideration, we assume that $h = n - d + 1$. In order to successfully
forge information symbols, the attacker must compromise
some storage nodes and make them store the corresponding
encoded symbols in $\tilde{y}G$, the codeword corresponding to the
forged information symbols. If the attacker compromises $k$
storage nodes, then whenever the data collector happens to access
these compromised storage nodes, the attack forges the data
successfully according to the decoding procedure. Let the
attacker compromise $b < k$ storage nodes. According to the
decoding procedure, when $h - b = n - d + 1 - b \le \lfloor (n-d)/2 \rfloor$,
where $\lfloor (n-d)/2 \rfloor$ is the error-correction capability of the RS
code, the decoding algorithm still has a chance to decode the
received vector to $\tilde{y}G$. Taking the smallest value of $b$, we
have $b = \lceil (n-d+2)/2 \rceil$. Hence, the security strength for
data-reconstruction is $\min\{k, \lceil (n-d+2)/2 \rceil\} - 1$ in MSR codes. Since
the $[n, k]$ RS code is used in decoding for MBR codes, the
security strength for them becomes $\min\{k, \lceil (n-k+2)/2 \rceil\} - 1$.
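The threshold on $b$ can be checked numerically. The sketch below searches for the smallest $b$ with $h - b \le \lfloor (n-\kappa)/2 \rfloor$ under the worst-case assumption $h = n - \kappa + 1$, and compares it with the closed form $\lceil (n-\kappa+2)/2 \rceil$, where $\kappa$ is the dimension of the RS code used for decoding ($\kappa = d$ for MSR, $\kappa = k$ for MBR). The function names and sample parameters are hypothetical and only illustrate the arithmetic.

```python
def min_polluters_bruteforce(n: int, dim: int) -> int:
    """Smallest b such that the decoder may still output the forged codeword.

    h = n - dim + 1 is the worst-case weight of the modification v, and
    t = floor((n - dim) / 2) is the error-correction capability.
    """
    h = n - dim + 1
    t = (n - dim) // 2
    for b in range(h + 1):
        if h - b <= t:
            return b
    raise ValueError("no feasible b")  # unreachable for n >= dim >= 1


def min_polluters_closed_form(n: int, dim: int) -> int:
    """ceil((n - dim + 2) / 2), written with integer arithmetic."""
    return (n - dim + 2 + 1) // 2


def security_strength(n: int, k: int, dim: int) -> int:
    """min{k, ceil((n - dim + 2) / 2)} - 1 for data reconstruction."""
    return min(k, min_polluters_closed_form(n, dim)) - 1


if __name__ == "__main__":
    n, k, d = 10, 5, 8  # hypothetical parameters with d = 2k - 2
    print("MSR:", security_strength(n, k, d))
    print("MBR:", security_strength(n, k, k))
```

For these sample values the MSR bound is limited by the small distance of the $[n, d]$ code, while the MBR bound is governed by the $[n, k]$ code.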

Next we investigate the forgery attack on regeneration. Since 
computing the CRC checksum is a linear operation, there is 
no need for the attacker to break the CRC checksum for the 
failed node. It only needs to make the forged data with all zero 
redundant bits. Hence, the security strength for regeneration 
is min{d, - 1. 
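The linearity argument can be checked on a toy example. The sketch below uses a hypothetical 8-bit CRC (polynomial 0x07, zero initial value, no final XOR), chosen only for illustration and not necessarily the CRC used by the scheme. It shows that XOR-ing stored data with an error pattern that is itself a valid CRC codeword (zero remainder) yields data that still passes the check, so the attacker never has to learn or break the original checksum.

```python
def crc8(data: bytes, poly: int = 0x07) -> int:
    """Bitwise CRC-8 with zero initial value and no final XOR (a linear map over GF(2))."""
    crc = 0
    for byte in data:
        crc ^= byte
        for _ in range(8):
            if crc & 0x80:
                crc = ((crc << 1) ^ poly) & 0xFF
            else:
                crc = (crc << 1) & 0xFF
    return crc


def append_crc(data: bytes) -> bytes:
    """Return data followed by its one-byte checksum."""
    return data + bytes([crc8(data)])


def passes_check(codeword: bytes) -> bool:
    """A codeword is accepted when the CRC over data plus checksum is zero."""
    return crc8(codeword) == 0


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


def forge(stored: bytes, delta: bytes) -> bytes:
    """Add an error pattern that carries its own valid checksum."""
    return xor_bytes(stored, append_crc(delta))


if __name__ == "__main__":
    data = b"original block"
    stored = append_crc(data)
    delta = bytearray(len(data))  # hypothetical modification chosen by the attacker
    delta[1], delta[4] = 0x01, 0x7F
    forged = forge(stored, bytes(delta))
    print(passes_check(stored), passes_check(forged), forged != stored)
```

A CRC with a nonzero initial value or final XOR is affine rather than linear, but the same cancellation applies to the difference of two equal-length messages.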

It can be observed that CRC does not increase the security
strength against forgery attacks. By using a hash value instead, the
security strength can be increased, since the operation to obtain a
hash value is non-linear. In this case, the attacker not only needs
to obtain the original information data but must also be able to
forge the hash value. Hence, the security strength can be increased
to at least $k - 1$ in data-reconstruction and at least $d - 1$ for
regeneration.⁴

C. Redundancy Ratios on Storage and Bandwidth 

CRC checksums incur additional overhead in storage and
bandwidth consumption. The redundancy incurred for
data-reconstruction is $r$ bits, the size of the CRC checksum. Each
information sequence is appended with the extra $r$ bits such that it
can be verified after reconstruction. The number of information
bits is $mk\alpha - r$ for MSR codes and $m(kd - k(k-1)/2) - r$
for MBR codes, respectively. For regeneration, we assume
that the $[n-1, k']$ RS code is used to distribute the encoded
CRC symbols to n — 1 storage nodes, where k' = [^p-J and 

4 For regeneration, the security strength is max{d, min{fc', [ g ~*~ 2 ] }} — 
1 = d — 1 since k' is usually less than d. 



$m' = \lceil \log_2 (n-1) \rceil$. Since each storage node must store the
encoded CRC symbols for the other $n - 1$ storage nodes, the extra
storage required for it is $(n-1)m'$ bits. The encoded data
symbols stored in each storage node occupy $\beta\alpha m$ bits.

The helper must obtain the correct CRC checksum for the
failed node to verify the correctness of the recovered data.
The $d$ storage nodes accessed need to provide their stored data
associated with the CRC checksum of the failed node to the
helper. Since each piece has $m'$ bits, the total extra bandwidth
is $dm'$ bits. The total bandwidth to repair the $\beta\alpha$ symbols
stored in the failed node is $\beta m d$ bits.
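To get a feel for the size of this overhead, the following sketch evaluates the storage and bandwidth expressions above for sample parameters. It assumes $m' = \lceil \log_2 (n-1) \rceil$ and reports each ratio as extra bits divided by payload bits, which is one plausible reading of "redundancy ratio" and may differ from the definition used in the summary table. All function names and sample values are hypothetical.

```python
def crc_symbol_bits(n: int) -> int:
    """m' = ceil(log2(n - 1)), computed exactly with integer arithmetic (assumed form)."""
    return (n - 2).bit_length()


def storage_overhead(n: int, alpha: int, beta: int, m: int) -> tuple:
    """Per-node extra CRC storage, data storage, and their ratio (all in bits)."""
    extra = (n - 1) * crc_symbol_bits(n)  # encoded CRC symbols for the other n - 1 nodes
    data = beta * alpha * m               # encoded data symbols of the node itself
    return extra, data, extra / data


def bandwidth_overhead(n: int, d: int, beta: int, m: int) -> tuple:
    """Extra CRC traffic, repair traffic, and their ratio for one regeneration (bits)."""
    extra = d * crc_symbol_bits(n)  # one m'-bit piece from each of the d accessed nodes
    repair = beta * m * d           # traffic needed to repair the failed node
    return extra, repair, extra / repair


if __name__ == "__main__":
    # Hypothetical MSR-like setting: n = 12, k = 4, d = 6, alpha = k - 1 = 3.
    n, d, alpha, beta, m = 12, 6, 3, 64, 16
    print("storage  :", storage_overhead(n, alpha, beta, m))
    print("bandwidth:", bandwidth_overhead(n, d, beta, m))
```

Both ratios shrink as $\beta$ grows, since the CRC cost per node is independent of $\beta$ while the payload scales with it.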

Table I summarizes the quantitative results of fault-tolerant
capability, security strength, and redundancy ratio of the MSR
and MBR codes.

VI. Related Work 

Regenerating codes were introduced in the pioneering works
by Dimakis et al. in [1], [2]. In these works, the so-called
cut-set bound was derived, which is the fundamental limit
for designing regenerating codes. The data-reconstruction
and regeneration problems were formulated as
a multicast network coding problem. From the cut-set bounds
between the source and the destination, the parameters of the
regenerating codes were shown to satisfy (1), which reveals
the tradeoff between storage and repair bandwidth. Those
parameters satisfying the cut-set bound with equality were also
derived.

The regenerating codes with parameters satisfying the cut-set
bound with equality were proposed in [3], [4]. In [3], a
deterministic construction of the regenerating codes with
$d = n - 1$ was presented. In [4], the network coding approach was
adopted to design the regenerating codes. Both constructions
achieved functional regeneration but not exact regeneration.

Exact regeneration was considered in [5]-[7]. In [5], a
search algorithm was proposed to search for exact-regenerating
MSR codes with $d = n - 1$; however, no systematic
construction method was provided. In [6], the MSR codes with
$k = 2$, $d = n - 1$ were constructed by using the concept of
interference alignment, which was borrowed from the context
of wireless communications. A drawback of this approach is
that it operates on a finite field with a large size. In [7], the
authors provided an explicit method to construct the MBR
codes with $d = n - 1$. No computation is required for
these codes during the regeneration of a failed node. Explicit
construction of the MSR codes with $d = k + 1$ was also
provided; however, these codes can perform exact regeneration
only for a subset of failed storage nodes.

In ifPTl . the authors proved that exact regeneration is im- 
possible for MSR codes with [n, k, d < 2k — 3] when /3 = 1. 
Based on the interference alignment approach, a code construction
was provided for MSR codes with $[n = d + 1, k, d \ge 2k - 1]$.
In [10], explicit constructions for optimal MSR codes with
$[n, k, d \ge 2k - 2]$ and optimal MBR codes were proposed.
The construction was based on the product of two matrices:
the information matrix and the encoding matrix. The information
matrix (or its submatrices) is symmetric in order to have the
exact-regeneration property.

The problem of security of regenerating codes was considered
in [8], [9]. In [8], the authors considered the security
problem against eavesdropping and adversarial attackers
during the regeneration process. They derived upper bounds
on the maximum amount of information that can be stored
safely. An explicit code construction was given for $d = n - 1$
in the bandwidth-limited regime. The problem of Byzantine
fault tolerance for regenerating codes was considered in [9].
The authors studied the resilience of regenerating codes which
support multi-repairs. By using collaboration among newcomers
(helpers), upper bounds on the resilience capacity of
regenerating codes were derived. Even though our work also
deals with Byzantine failures, it does not need
multiple helpers to recover from the failures.

The progressive decoding technique for distributed storage
was first introduced in [16]. The scheme retrieves just enough
data from surviving storage nodes to recover the original
data in the presence of crash-stop and Byzantine failures.
The decoding is performed incrementally such that both
communication and computation costs are minimized.

VII. Conclusions 

In this paper, we considered the problem of exact regeneration
with error-correction capability for Byzantine fault
tolerance in distributed storage networks. We showed that
Reed-Solomon codes combined with CRC checksums can be
used for both data-reconstruction and regeneration, realizing
MSR and MBR in the latter case. Progressive decoding can
be applied in both applications to reduce the computational
complexity in the presence of erroneous data. Analysis of the fault
tolerance, security, storage, and bandwidth overhead shows that
the proposed schemes are effective without incurring too much
overhead.